Disfluency detection based on prosodic features for university lectures
Abstract
This paper focuses on identifying disfluent sequences and their distinct structural regions from acoustic and prosodic features. The reported experiments use a corpus of university lectures in European Portuguese, totaling roughly 32 hours, with a relatively high percentage of disfluencies (7.6%). The features automatically extracted from the corpus proved discriminative of the regions that make up a disfluency. Several machine learning methods were applied, and the best results were achieved with Classification and Regression Trees (CART). The most informative features for cross-region identification were word duration ratios, word confidence score, silent ratios, and pitch and energy slopes. Features such as the number of phones and syllables per word were more useful for identifying the interregnum, whereas energy slopes were best suited for identifying the interruption point.
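The abstract does not say how the slope and ratio features are computed. As a minimal sketch, assuming a pitch (or energy) slope is a least-squares line fit over a word's samples and a word duration ratio compares a word with the one that follows it, the two features might look like this. The function names and sample values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def slope(values: Sequence[float], times: Sequence[float]) -> float:
    """Least-squares slope of values over times (e.g. Hz per second)."""
    n = len(values)
    if n != len(times) or n < 2:
        raise ValueError("need at least two aligned samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    denom = sum((t - mean_t) ** 2 for t in times)
    if denom == 0:
        raise ValueError("time stamps must not all be equal")
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return num / denom


def duration_ratio(current: float, following: float) -> float:
    """Ratio of a word's duration to the following word's duration."""
    if following <= 0:
        raise ValueError("following duration must be positive")
    return current / following


# Hypothetical data for one word: pitch samples (Hz) at 10 ms steps,
# and durations (s) of this word and the next one.
times = [0.00, 0.01, 0.02, 0.03]
pitch = [200.0, 190.0, 180.0, 170.0]
features = {
    "pitch_slope": slope(pitch, times),
    "duration_ratio": duration_ratio(0.30, 0.15),
}
```

A feature dictionary of this kind, one per word, is the sort of input a decision-tree learner such as CART could consume. The same `slope` helper applies unchanged to an energy contour.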
Similar resources
Comparing Different Machine Learning Approaches for Disfluency Structure Detection in a Corpus of University Lectures
This paper presents a number of experiments focusing on assessing the performance of different machine learning methods on the identification of disfluencies and their distinct structural regions over speech data. Several machine learning methods have been applied, namely Naive Bayes, Logistic Regression, Classification and Regression Trees (CARTs), J48 and Multilayer Perceptron. Our experiment...
Prosodic context-based analysis of disfluencies
This work explores prosodic cues of disfluencies in a corpus of university lectures. Results show three significant (p < 0.001) trends: pitch and energy slopes are significantly different between the disfluency and the onset of fluency; those features are also relevant to disfluency type differentiation; and they do not seem to be a speaker effect. The best combination of linguistic features one...
Spontaneous Mandarin Speech Recognition with Disfluencies Detected by Latent Prosodic Modeling (LPM)
In this paper, a new approach for improved spontaneous Mandarin speech recognition using Latent Prosodic Modeling (LPM) for disfluency interruption point (IP) detection is presented. The basic idea is to detect the disfluency interruption points (IPs) prior to recognition, and then to incorporate this information into the recognition process via second-pass rescoring. For accurate dete...
Analysis of disfluencies in a corpus of university lectures
This paper analyzes the prosodic properties of disfluencies and of their contexts in a corpus of university lectures. Results show that there is a general tendency to repair fluency by means of prosodic contrast marking strategies (pitch and energy increase), regardless of the specific disfluency type, but still there are degrees in the contrast made by certain types. As for tempo patterns, the...